MAPPING INDIVIDUAL LOGICAL PROCESSES IN INFORMATION SEARCHING


Introduction


Certain individuals display an unusual ability to find specific
information quickly in large collections such as libraries. Perhaps if one
understood sufficiently well the complex logical processes by which these 
individuals operate, he might formalize these processes somewhat and so 
enhance the effectiveness of other information searchers. Experience in 
many fields of science teaches, however, that careful measurement of a 
phenomenon usually precedes its understanding and description. With this 
precept in mind, the acquisition of a computer terminal with attached 
printer (the terminal being connected to a computer programmed to search 
a large file in an interactive fashion) seemed to offer an opportunity to 
chart in a quantitative fashion the logical processes used by various in- 
dividuals in isolating specific information from a very large set. 

Accordingly, an experiment was designed around this terminal
equipment and a number of tests were conducted. This is a report on the
experiment and the results which were obtained. The results seem to be
sufficiently specific and informative that one may recommend the procedure, or
variants thereof, as a tool for acquiring a statistically significant num- 
ber of logic mappings from which some universal laws perhaps can be deduced. 
This report, however, is intended to describe only the mapping process 
itself and to illustrate it by one example rather than to present a de- 
tailed analysis of the effectiveness of various search techniques as re- 
vealed by maps of their logical processes. 


EXPERIMENT DESIGN 


The terminal was connected by telephone line to a computer located in 
College Park, Maryland, on which is mounted the entire NASA scientific
information collection.* This collection, dating from 1962, contains
approximately 750,000 items. Each item is indexed under an average of
15 terms.** The number of separate index terms recognized by the computer
is somewhat in excess of 18,000. The collection covers a very broad
spectrum of scientific disciplines.


* An item may be a patent or a computer program instruction manual as well
as those documents cited in the next paragraph.

** An index term is a word or phrase describing a concept. For example, a
document dealing with psychological testing may be indexed under such
terms as "psychometrics," "testing," etc.



Some disciplines, of course, are represented more extensively in the 
collection than others. Items come from both domestic and foreign sources. 
Approximately half of the items are report-type literature and the other
half are from the journal literature. 

The report literature contains documents from government laboratories, 
government contractors, translations of foreign reports, doctoral disser- 
tations, university reports, etc. Some 1800 scientific and technical 
journals from all over the world are scanned for relevant items to add to 
the collection. For many problems in the physical sciences, the NASA 
collection is the most accessible one available. For suitable problems, 
it permits the searcher to exercise his knowledge, experience, and in- 
genuity to a considerable extent in order to improve his recall of rele- 
vant documents. The flexibility it offers makes it a good vehicle to use 
for studying the logical processes a searcher employs in going about the 
task of identifying the relevant documents in a large collection and the 
differences in the logical tacks taken by various searchers. 

At the terminal is located an instruction manual which the user may 
consult to refresh his memory on the operations needed to access specific 
information in the system. The system will accept commands to display: 

1. A portion of the list of accepted terms, in an alphabetical 
order with 5 terms prior to the specified term and up to 37 
subsequent terms, if desired. This is termed an expansion 
about the specific term. Beside each term will be indicated 
the total number of documents in the system indexed under 
that term. 

2. A hierarchical list of terms related by subject to the specified
term. This is called a thesaurus expansion. The number of docu- 
ments posted to each term is also shown. 

3. The accession number or identifying number of every document 
indexed under a given term. 

4. A full bibliographic citation of every document indexed under 
a given term. 

5. An alphabetical list of authors of documents in the system with 
the number of documents by each listed beside the name. The 
author's name can be used to obtain the accession numbers or 
full bibliographic citation of all documents associated with 
that name. 

6. The same can be done for corporate source or contract number, 
but these are seldom used. 
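
The alphabetical expansion described in item 1 can be sketched as a window over a sorted term list. This is only an illustration under stated assumptions: the function name `expand`, the miniature vocabulary, and every posting count except the 10,954 for "thermal" are hypothetical and do not come from the NASA system.

```python
from bisect import bisect_left


def expand(postings, term, before=5, after=37):
    """Return (term, count) pairs alphabetically around `term`.

    `postings` maps each index term to its number of posted documents.
    The window holds up to `before` terms preceding `term` and up to
    `after` terms following it. `term` itself is always shown, with a
    count of 0 when it is not an accepted term.
    """
    vocabulary = sorted(postings)
    pos = bisect_left(vocabulary, term)
    present = pos < len(vocabulary) and vocabulary[pos] == term
    preceding = vocabulary[max(0, pos - before):pos]
    start_after = pos + 1 if present else pos
    following = vocabulary[start_after:start_after + after]
    window = preceding + [term] + following
    return [(t, postings.get(t, 0)) for t in window]


# Hypothetical vocabulary; only the count for "thermal" is from the report.
postings = {
    "telemetry": 120,
    "temperature": 900,
    "temperature measurement": 300,
    "tensile strength": 80,
    "thermal": 10954,
    "thermal emission": 40,
}

for shown_term, count in expand(postings, "thermal"):
    print(f"{count:6d}  {shown_term}")
```

Because the window is positioned alphabetically, expanding a term that is not in the vocabulary still displays its neighbors, each with its own posting count, while the unknown term itself shows zero.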

The system can also be commanded to search all of the documents posted 
under a given term for the presence in the indexing of a second (or third, 
etc.) term. The occurrence of a second desired term can be made the basis 
either to select or reject the document. If the document is selected on 
this basis it may be said to have been selected through a logical "and" 
process, i.e., the document must be indexed under both term A and term B.
If the document is rejected, the process is termed a logical "nor" or "not"
process. The reader may quickly determine that this capability permits 
literature searches of very elaborate strategy to be constructed. On the 
other hand, the indexing is assigned manually. This means that in addition 
to the general rules usually observed by the indexers (which are soon
apparent to the veteran searcher) there are also anomalies caused by mis- 
understanding of the document by the indexer, inconsistency from day to 
day, the desire to limit the number of index terms to around 15, or by 
simple human error. As a result, it is usually not productive to write 
searches of great complexity. Requiring that the indexing of a particular 
document contain more than two desired terms usually results in unacceptably 
small output. 
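
The logical "and" and "not" operations, and the way output shrinks as more terms are required, can be illustrated with ordinary set operations. The sketch below is a hedged illustration, not the system's actual implementation: the function names `require_all` and `exclude`, the index contents, and the accession numbers are all hypothetical.

```python
def require_all(index, terms):
    """Logical "and": documents indexed under every term in `terms`."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def exclude(index, keep_term, reject_term):
    """Logical "not": documents under `keep_term` but not `reject_term`."""
    return index.get(keep_term, set()) - index.get(reject_term, set())


# Hypothetical index: term -> accession numbers of documents posted to it.
index = {
    "radiometers": {101, 102, 103, 104, 105},
    "temperature measurement": {102, 103, 105, 108},
    "water": {103, 109},
}

two_terms = require_all(index, ["radiometers", "temperature measurement"])
three_terms = require_all(
    index, ["radiometers", "temperature measurement", "water"]
)
without_water = exclude(index, "radiometers", "water")

print(sorted(two_terms))      # documents indexed under both terms
print(sorted(three_terms))    # requiring a third term narrows the output
print(sorted(without_water))  # radiometer documents not indexed under water
```

Each additional required term can only keep or reduce the number of selected documents, which is why inconsistent manual indexing makes searches requiring more than two terms return so little.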

This, then, is the system with which the human searcher interacts. He 
is put before a cathode ray tube display and a typewriter keyboard. He 
communicates his thoughts to the computer through this keyboard. The CRT 
screen displays his question, the system’s response, questions from the 
system to him, and his response. A printer alongside is rigged to record 
the entire "conversation" including clock time for the process. The searcher 
is handed a statement of the problem and asked to go to work. The room is 
semi-sealed against disturbance. Only if the searcher has trouble making
himself understood to the system (which sometimes happens with
inexperienced searchers) or encounters technical difficulty (which occurs at
random times) does he emerge and ask assistance of monitoring personnel.

The search problem selected to illustrate the capabilities of such a 
procedure to map thought processes was one that was received during the 
normal course of business at the North Carolina Science and Technology 
Research Center. It had proven to be particularly troublesome because the 
400 or so relevant documents in the collection suffered from indexing 
vagaries that made their retrieval difficult without, at the same time, 
retrieving a large quantity of extraneous material. In such a problem, 
it was felt, a searcher could really demonstrate his skill and knowledge. 

The search problem statement is reproduced in the Appendix. The printed 
record of the "conversation" with the computer was given to the experiment 
director along with the "hit" list the searcher felt was his best effort. 
Abstracts for all of those "hits" were examined by the experiment director
who decided which of the hits were relevant. This was done to determine 
what portion of the searcher's hit list was pertinent and how many of the 
pertinent documents in the entire collection the search identified. 

The printed records of the "conversations" with the system were then 
prepared in the form of a standardized flow chart to facilitate identifying
the approach and results of each searcher. In the process of preparation, 
mistakes which elicited responses from the system such as "Invalid Command 
Proceed" were eliminated as were garbled conversations resulting from system 
failures. Reproduced below as Figure I is the flow chart for a search per- 
formed by an undergraduate engineering student with a fair knowledge of 
system operation and indexing problems. The search shown in Figure I re- 
quired the searcher to remain at the terminal approximately three hours. 


LOGICAL PROCESSES EMPLOYED BY THE SEARCHER AS
DEDUCED FROM HIS SEARCH 


In order to follow the logical processes depicted in Figure I, one 
should note that the following symbolism is employed: 

(1) A rectangle indicates the term which was the basis for a 
following operation.

(2) In the upper left corner of the rectangle is a symbol de- 
noting the source of the term: 

m from his mind 

e from a previous expansion 

t from a thesaurus expansion 

c from a document's bibliographic citation 

(3) A rectangle with curved sides attached indicates what the 
searcher saw on the CRT screen, e.g., a "display." 

(4) The elongated hexagon denotes an operation, e.g., expand
"thermal." 

(5) The diamond indicates a decision which can be answered yes 
or no. 

(6) The circle indicates premature termination of an operation.

(7) Operations, display, etc., are arranged in chronological 
order with arrows leading from the proper antecedents and
to the point in the search where subsequent operations are 
performed. 

(8) Where it is necessary to break a line at the bottom of a
column, an identifying letter is inserted so that when the 
reader again sees that letter, he knows the line of which 
it is a continuation. 

(9) Lines to the right of terms in a display indicate that the 
searcher selected those terms to form sets. 

The searcher began by requesting an expansion about the term he
considered to be basic to the whole search, the term "thermal." On the
basis of the first display+ he selected two terms and felt that these were
sufficiently pertinent for him to continue the displays until the display
limit++ was reached. From this expansion he selected a total of 9 terms and
formed 8 sets.+++ Note that he is assured of a large number of documents
upon which he could perform further logical operations by the fact that the
term "thermal" is posted to 10,954 documents.*

+ Display refers to the material presented at a given time on the face of
the cathode ray tube viewing device.

++ Approximately 14 additional terms are displayed on the CRT each time the
command "more" is given until the limit of 41 terms has been reached.

One of the terms displayed in the expansion, "thermal emission," seemed, 
on the basis of the problem statement and his general knowledge of the sub- 
ject area, to be worth examining further. The searcher therefore requested 
a thesaurus expansion from which he selected three terms. He then combined 
these to form set 9. 

To get another start on the problem, the searcher went back to the cen- 
tral idea and selected a broad but related term, "temperature," on which to 
request an expansion. He knew from previous experience that asking the same 
question in several ways is often necessary to assure that documents dealing 
with a particular subject (but indexed by different individuals using similar 
but not identical terms) are retrieved. Unfortunately, he misspelled it 
"temperture." The display indicated no postings under this term. The 
searcher nevertheless did find what he considered to be a useful term in 
this display. He then decided to overcome his spelling deficiency by using 
first the root of the term, "tempera." This produced the desired result 
and he again displayed all the terms up to the limit, selecting 12 to form 
10 additional sets. 

Next, he chose to perform a thesaurus expansion on one of the terms 
which appeared during the previous display, "temperature measurement."
This term, of course, is specifically what the search is about. Since
the measurement was to be carried out by observing the radiation, he chose 
"radiation pyrometer" as a set. 

Because the search statement had contained the term "radiometer," he 
then decided to see what useful terms might be alphabetically related to 
this. The expansion was reasonably productive but he decided on viewing
the first display that a thesaurus expansion would be more appropriate.
From this he selected seven terms.

He then went back to the general concept of "measurement" and per- 
formed a thesaurus expansion which yielded a number of useful terms. 


+++ A set is a group of documents labeled by one or more specific terms
or an ensemble of such groups. It is created by the searcher be- 
cause of his desire to perform some further operation with this group 
of documents. Each successive group of documents, or set, is assigned 
a number in order by the computer. 

* This is to say that the index term "thermal" has been assigned to
10,954 documents in the collection. To speak of "postings" means
that x number of documents are assigned, or posted, to a specific
index term.

In the general sense the concepts of "temperature" and "heat" are
closely related. Seeking to formulate the idea of "temperature measure- 
ment by detection of radiant heat flux" in other terms, he chose to ex- 
pand the general term "heat." Only three of the terms displayed, he felt, 
added significantly to those he had already selected. 

He then approached the idea of measuring the temperature distribution 
over a surface as a form of mapping and sought to determine whether any of 
the related terms would be useful. From the thesaurus expansion he selected 
only three. 

He then tried a different approach on the idea of measurement by using 
the term "sensing." This appeared to be quite productive. At this point
he decided to winnow down the number of documents to those more pertinent.
He combined those sets dealing with temperature measurement to get a single
set, number 39. He also combined sets dealing with radiometers to obtain 
set 40 and sets dealing with mapping and sensing to obtain set 41. The 
sets dealing with temperature in a general way were intersected** with set
41 to give a new set, 42, which contained 465 documents indexed under both
the idea of temperature and that of mapping. The general idea of
measurement was then combined with set 41. The new set, number 43, was
intersected with sets on heat and radiation measurement to obtain 1443
documents (set 44) dealing specifically with radiation mapping. Set 43 was
also intersected with sets 39 and 40. The result was a set of 1473 documents
dealing with mapping by radiometer.
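
The bookkeeping in this stage, in which each new group of documents receives the next set number and is then combined or intersected with earlier sets, can be sketched as follows. This is a minimal illustration under assumptions: the class `SearchSession` and its method names are hypothetical, "combine" is assumed here to mean a logical "or" (union), and the accession numbers are invented.

```python
class SearchSession:
    """Keeps numbered document sets, numbering them in order of creation."""

    def __init__(self):
        self.sets = {}

    def _store(self, documents):
        number = len(self.sets) + 1
        self.sets[number] = frozenset(documents)
        return number

    def select(self, documents):
        """Create a set from the documents posted to a chosen term."""
        return self._store(documents)

    def combine(self, *numbers):
        """Assumed logical "or": documents appearing in any listed set."""
        merged = set()
        for number in numbers:
            merged |= self.sets[number]
        return self._store(merged)

    def intersect(self, first, second):
        """Logical "and": documents appearing in both sets."""
        return self._store(self.sets[first] & self.sets[second])


session = SearchSession()
temperature = session.select({1, 2, 3, 4})    # set 1 (hypothetical postings)
heat = session.select({4, 5})                 # set 2
mapping = session.select({2, 4, 6})           # set 3
broad = session.combine(temperature, heat)    # set 4: widen the concept
narrow = session.intersect(broad, mapping)    # set 5: then narrow it

print(narrow, sorted(session.sets[narrow]))
```

Combining related sets first and intersecting afterward mirrors the searcher's approach of building a broad concept from many near-synonymous terms before narrowing it against a second concept.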

In this process, one term with only two postings caught his eye: 
"sensor for airborne terrain analysis." He decided therefore to display
the two citations to see if they were relevant. They were not. 

Up to this point, he had established that the documents in sets 39,
40, 42, 44, and 45 with some 9000 postings seemed to bear on the problem
of measuring or mapping radiation fluxes over large areas. Since he was
specifically interested in the emission from water, he developed some
expansions about the idea of liquids. Except for the general terms
themselves, the results did not appear to him to be particularly promising.
He then began to explore the idea of surfaces and assembled all the
documents (495) dealing with the idea of liquid surfaces.

Backtracking for a moment, he intersected the idea of radiometers 
(set 40) with the concepts of temperature measurement, temperature mapping, 
and radiation mapping (sets 39, 42, and 44). This produced set 53 with 347 
accessions. Ultimately, he decided that set 53 represented his best effort 
on the question, but at the time, he decided to try one more tack to see 
if some useful documents might show up. 


** To intersect means to require that the indexing for a particular
document contain both a term from list "A" and a term from
list "B."

He had not yet investigated the possibilities of approaching the
question from the point of view of sensing from an aerial platform, which 
is how this job was to be done in the field. He began to travel this path 
by asking for an expansion on "airborne." Although he selected one term,
the expansion did not appear to be too fruitful. He then made a thesaurus 
expansion on the selected term without gaining what he considered to be 
additional insight. At this point, he tried an expansion on a general, 
but related, term, "aerial." Again he got only one term which he felt 
might be useful, so that he decided not to continue the expansion. 

The two terms selected from this effort he combined to create set 56. 

He was obviously undecided as to what to do with it at the moment because 
he then intersected set 53 with set 52 (dealing with liquid surfaces) and 
got only 4 documents (set 57). This, of course, is too few so he then 
tried searching the 2956 accessions in set 56 for index terms also in set 
53 (by performing a logical intersection or "and"). This gave set 58 with
31 documents. 

To see if he were in the right area in his selections, he asked for 
two of the citations from set 57. These looked to be very pertinent, so 
he intersected set 57 with set 58. Since three of the four documents in 
set 57 also appeared in set 58, he felt that set 58 probably was also 
quite pertinent. He therefore asked that complete bibliographic infor- 
mation on the documents in 57 and 58 be printed. Since both were subsets 
of 53, he felt that 53 would probably also contain a large amount of rele- 
vant data. Finally, then, he asked that all the accession numbers in set 
53 be printed. 

As a matter of interest, it may be noted that ultimately 388 separate 
relevant documents were found in the system through a total of eight searches 
by different individuals. The particular search discussed here found 121 
or 31.2% of these. Of the 347 hits designated by the searcher as
constituting his search results, 34.9% were relevant. This is a very good
performance for this type of question (i.e., one without highly specific
indexing). 
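
The two percentages follow directly from the counts given above. The short check below uses only figures from this report; the function names are illustrative, and "precision" is the conventional information-retrieval name for the second ratio rather than a term used in the report itself.

```python
def recall(relevant_retrieved, relevant_in_collection):
    """Fraction of all known relevant documents that the search found."""
    return relevant_retrieved / relevant_in_collection


def precision(relevant_retrieved, total_retrieved):
    """Fraction of the searcher's hit list that was relevant."""
    return relevant_retrieved / total_retrieved


RELEVANT_FOUND = 121      # relevant documents in this searcher's hit list
RELEVANT_KNOWN = 388      # relevant documents found across all eight searches
HITS_SUBMITTED = 347      # size of the hit list (set 53)

recall_pct = round(100 * recall(RELEVANT_FOUND, RELEVANT_KNOWN), 1)
precision_pct = round(100 * precision(RELEVANT_FOUND, HITS_SUBMITTED), 1)

print(f"recall    = {recall_pct}%")
print(f"precision = {precision_pct}%")
```

Note that the recall figure is measured against the 388 relevant documents discovered by the eight searches together, so it is an upper estimate if further relevant documents remain unfound in the collection.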

The foregoing is a recitation of the specific steps taken at each point 
in the search process by the searcher as he developed his list of relevant 
documents. It was deduced from the printed record of the searcher's
conversation with the computer in the form of the flow chart (Figure I).
Perhaps, however, the thought processes involved here can be better
understood by viewing the search in somewhat more general terms. The searcher
seems to have operated implicitly under these guidelines:

1. Attack the problem in as many different ways as you can
reasonably conceive. Note that in this search he employed 
at least 4 distinct approaches. 

2. Employ intersections between rather general terms or strings
of related terms in addition to more specific terms to iden- 
tify pertinent documents. 

3. Do not be afraid to use a large number of terms. This search 
used about 75. 

4. Examine some of your intermediate output from time to time to 
see if you are on the right track. 

5. The number of pertinent documents retrieved is related in a 
general way to the time spent in pursuing the search; thus, 
keep on thinking.


CONCLUDING REMARKS


The example presented above illustrates how one can use interactive 
computer, recording, and flow charting techniques in order to map human 
thought processes rationally and in considerable detail. It seems reason- 
able to suggest, then, that we might adapt these techniques to measuring 
thought processes in other areas of human endeavor. In particular, they 
seem adaptable to those thought processes which involve a cycle of obser- 
vation, decision or action, and revelation of the consequences of that 
decision or action. Analysis of a large number of such measurements in 
a variety of problem areas and on a variety of individuals should be 
quite beneficial in formulating both specific and general educational 
policies and methods. 


APPENDIX




Airborne Thermal Mapping of Water Surfaces


A consulting firm is bidding on a contract to map
the water temperature in lakes which are used to cool
thermal power plants. The mapping will be done by
flying over in an airplane (3000' - 6000' above lake
surface), observing the lake with a radiometer, recording
the radiometer output, and then processing the data so
as to be able to draw lines of constant temperature on
an accurate outline of the lake. The purpose, of course,
is to make it possible to check compliance with thermal
pollution standards easily and quickly.

Specifications call for measuring accuracies of
0.5°F. The consulting firm thinks that 1°F. is about
all that they can do realistically. So, they would like
to know (a) does any responsible group claim 0.5°F.
accuracy? (b) If so, what

1. sensor technology 

2. mathematical modeling techniques 

3. data reduction methods 

4. computerized data processing techniques 

do they employ? 


NOTE: A radiometer is a device which senses the heat
radiated from the surface of a body. It can be focused
so that it "looks" at a relatively small area. The
atmosphere absorbs some of this radiation, the
absorption being dependent upon air temperature, density, and
distance the radiation travels. Therefore, the angle at
which the radiometer looks at the surface from the
airplane determines how much air the heat waves have to go
through, what its density is, and how it's "layered."
From this knowledge and a mathematical model of the
absorption and dispersion of infrared radiation by a
variable density atmosphere, one can arrive at an estimate of
the surface temperature in terms of the radiometer
indication. Also, a lake is only a semi-opaque body, which
means that not all of the radiation that one detects is
from the layer immediately below the surface. These
effects are also included in the mathematical model.



The radiometer output is an electrical signal
proportional to the intensity of the thermal radiation and
its spectral distribution (how much energy is radiated
at each wavelength). One must therefore have a procedure
for converting this signal to temperature. Ideally, the
whole process could be one in which the user merely
submits a magnetic tape recording of radiometer output
looking at a given part of the lake to a computer program.
The program reduces the data from this and other passes
over the lake and from these maps the lake surface
temperatures.



